This thesis analyzes the use of a discrete Fourier transform, a template and a frequency filter as scene analysis tools. That analysis leads to a problem formulation that permits mathematical optimization of the frequency filter. The problem is then recast to include the optimization of the transform and template as well. To permit straightforward synthesis, an alternate quadratic cost function is developed, and the minimization of the cost is reduced to a set of linear equations.

The relative merits of norm and dot product correlations and the expected performance of first and second order systems are discussed. Application to reading text is referenced throughout.


In scene analysis we have a pattern of light intensities that is the result of light reflected from, or emitted by, some object or set of objects. The set of objects is called a scene. The task of scene analysis is to locate and/or identify some objects from the given pattern. The task can be extended to three dimensions and can include location, identification, rotation and size. That extension is not considered in this study. The term image, in this study, refers to a specific realization of the light intensity from a scene. That realization is the light intensity at each point of an $N \times N$ grid, and can be considered a vector in $\mathbb{R}^{N^2}$. The term object refers to the digitization of the light emitted or reflected from the object of interest in the scene. The light intensity due to all other objects is referred to as clutter. One popular technique for scene analysis is to take some expected pattern of light intensities for the object and compare it with the light intensities in the image. The expected pattern of light intensities is referred to as the template. In this study the template is taken to be the digitization of the light from the object with no clutter. There is also noise associated with the object that results from three-dimensional rotation of the physical object, changes in lighting, or any other possible variations in the scene. We also refer to an average template. That template, as the name suggests, is the average of a collection of templates of a particular object.

A proposed technique for scene analysis makes use of a discrete Fourier transform (DFT), a template and a frequency filter (Ref: 1-3). The author analyzed a particular algorithm of that type and reports that analysis in AFIT TR-81-1. The algorithm analyzed was proposed by Moshe Horev, a Major in the Israeli Air Force (Ref: 3). The empirical part of that analysis required an algorithm to transform the rectangular-coordinate array of light intensities into a polar array of light intensities. The point-by-point conversion used previously was slow, so the author developed a new algorithm to make that coordinate transformation more efficient. That algorithm makes use of the properties of an array processor and a large data storage disk. The details of that algorithm are contained in AFIT TR-81-2.

The proposed procedure for analyzing a scene is to form an average template, take the DFT of both the template and the unknown image, filter both in the frequency plane, and then compare the two resulting vectors using either the norm of their difference or their dot product. This study first formulates a problem statement so that a mathematically optimum frequency filter can be synthesized. That analysis suggests that it is plausible to optimize not only the filter but also the template and the transform (the DFT may not be optimum).
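The procedure above can be sketched in Python with NumPy. This is a minimal illustration, not the algorithm analyzed in AFIT TR-81-1. It assumes N x N real images and NumPy's unitary (`norm="ortho"`) 2-D DFT, and it uses an all-pass filter `F` (every gain equal to one) as a stand-in for an optimized filter. The function names and the random data are hypothetical.

```python
import numpy as np


def average_template(samples):
    """Average template of one class: mean over a stack of C sample images (C x N x N)."""
    return np.mean(samples, axis=0)


def filtered_dft(image, freq_filter):
    """Unitary 2-D DFT of an N x N image, multiplied pointwise by a real frequency filter."""
    return freq_filter * np.fft.fft2(image, norm="ortho")


def compare(unknown, template, freq_filter):
    """Return (squared norm of the difference, real dot product) in the filtered frequency plane."""
    fu = filtered_dft(unknown, freq_filter)
    ft = filtered_dft(template, freq_filter)
    norm_corr = float(np.sum(np.abs(fu - ft) ** 2))
    dot_corr = float(np.real(np.vdot(ft, fu)))  # np.vdot conjugates its first argument
    return norm_corr, dot_corr


# Hypothetical data: two classes of 8 x 8 images, 5 samples each.
rng = np.random.default_rng(0)
N = 8
samples_a = rng.random((5, N, N))
samples_b = rng.random((5, N, N)) + 0.5
templates = [average_template(samples_a), average_template(samples_b)]
F = np.ones((N, N))  # all-pass placeholder filter, not an optimized one

unknown = samples_b[0]
norm_scores = [compare(unknown, T, F)[0] for T in templates]
predicted = int(np.argmin(norm_scores))  # smallest norm correlation wins
```

With the all-pass filter, the frequency-plane comparison reproduces the spatial-domain comparison exactly, because the unitary DFT preserves norms and dot products. An optimized `F` would instead reweight the individual frequencies.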

A subtle but significant change in the point of view occurs, and it is detailed in section 2.5. The study starts out investigating the concept of analyzing a scene by using the DFT, a set of templates and a frequency filter. The concept then changes. In the final problem statement and solution, it becomes the following: take a set of images (vectors in a vector space, referred to as the data space) whose analysis is known (a human does that analysis), and translate the results of that analysis into vectors in a second vector space (one dimension for each result of interest, referred to as the destination space). The final step is to optimize a mapping of the first space into the second. This view is radically different from the starting point, but it produces an optimization problem that can be solved.

The change is more fundamental than it may at first appear. Normally a problem is approached in terms of a model. That is, a model is developed from how the data have been observed to interact (or from physical laws). The system is then optimized based on that model and run to determine whether satisfactory performance has been achieved. The approach developed here instead derives a mapping by requiring that the results of the mapping optimally match the desired (and known) outputs. This optimum mapping is calculated for a limited sample (hopefully representative of any data that might be encountered), on the assumption that other data will also be mapped as desired. That should be the case if the data used to construct the mapping represent a sufficient cross-section of the possible input data. Note that this concept is not limited to the scene analysis problem.

Finally, the objective of this optimization and problem formulation is to solve some scene analysis problem rigorously. This study was prompted by the problems encountered when letter identification was attempted (AFIT TR-81-1) and by the resulting frustration. Its aim is to synthesize a working solution to some scene analysis problem. The approach is to start with a workable problem, develop insights into the characteristics of this problem formulation, and then theorize about how to use those characteristics on more complicated problems. The simple problem discussed is the reading of uppercase handprinted text, which is further simplified to the problem of identifying isolated handprinted uppercase letters. The key point is that scene analysis is a very complicated subject, and the effort here is to formulate a simple, solvable problem.


II.  THEORY 


The first idea developed is to optimize the frequency filter. A formal statement of that objective is given in section 2.1. The assumptions that make the mathematical statement simple and clear are discussed in section 2.2. Sections 2.3.1 and 2.4.1 state the optimization problem and simplify the equations as much as possible. That statement suggests it might not be much more difficult to optimize the template and the transformation along with the filter (the DFT may not be optimum). That problem is developed in sections 2.3.2 and 2.4.2. The results of optimizing all three elements of the correlation procedure suggest eliminating the correlation, template and filter completely and optimizing only a mapping. The remainder of the chapter approaches the problem from that new point of view.

2.1  Problem  Statement 

The problem considered in this chapter is the task of locating and identifying any one of T objects in a cluttered scene. The particular problem referenced throughout this chapter is reading text. A restriction on the problem is that the text is made up only of uppercase block letters, so the problem has 26 classes of objects. A set of test objects is also referenced in the text. That set is made up of C samples from each class of objects. The statement of the problem is:

(1) There are T classes of objects that are of interest.

(2) Given are C samples from each class of objects.

(3) Given are P pages of text with known letter locations and identities.

(4) One problem is to learn to identify isolated letters.

(5) A second task is to learn to locate the center of letters in text.

(6) A third problem is to identify letters that have been located in text.

(7) A practical but questionable assumption is made: the text given is assumed to be representative of the real text that will be encountered, and results from the text given are assumed to extrapolate to other text. In other words, the algorithm is trained on a limited set and expected to perform on the universal set.


The problem statement separates the location and identification tasks. That separation may not be necessary, but it is an integral part of this problem statement. The assumption is made because putting both tasks into a single cost function would compromise each. A joint cost function is also much more complex to formulate, and we are looking for a simple, solvable, optimizable problem statement.


2.2 Points of Discussion

2.2.1 Generalized Correlation: Norm vs Dot Product


The norm and the dot product do not give the same result when used as a measure of correlation. A simple example demonstrates this. The templates for the two classes might be

$$T(1) = (5,2,1), \qquad T(2) = (3,1,2) \tag{1}$$

and the unknown is

$$UN = (4,1,1.5) \tag{2}$$

Notice that the vectors are three-dimensional and can be considered intensities at 3 pixels. The first template could represent a light gradient that fades off to the right. The second could represent one that fades and then brightens again. These patterns would be easily recognizable by a human, and the unknown would be judged an element of the second class. The norm and dot product correlations between the unknown and the templates are as follows (the preferred value of each measure is marked with *):

$$\begin{aligned}
\|UN - T(1)\|^2 &= 2.25, & \|UN - T(2)\|^2 &= 1.25^{*},\\
\langle UN, T(1)\rangle &= 23.5^{*}, & \langle UN, T(2)\rangle &= 16
\end{aligned} \tag{3}$$

The norm decision is to pick T(2), but the dot product decision is to pick T(1). Notice that the minimum norm (difference) and the maximum dot product (alignment) are both measures of likeness. The first thought is to normalize the vectors, since that makes the dot product exactly a measure of alignment. The results are then

$$\begin{aligned}
\|NT(1) - NUN\| &= 0.2103^{*}, & \|NT(2) - NUN\| &= 0.2252,\\
\langle NT(1), NUN\rangle &= 0.9779^{*}, & \langle NT(2), NUN\rangle &= 0.9746
\end{aligned} \tag{4}$$

where the prefix N indicates that the vectors have been normalized. The decisions are now consistent with each other, but they are not what was intended. The norm and dot product correlations here are at best inconclusive.
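The numbers in equations (3) and (4) can be checked directly with NumPy. The variable names below are illustrative. Following equation (6), the unnormalized norm correlation is taken as a squared norm, while the normalized one is a plain norm; the decisions are the same either way.

```python
import numpy as np

T1 = np.array([5.0, 2.0, 1.0])
T2 = np.array([3.0, 1.0, 2.0])
UN = np.array([4.0, 1.0, 1.5])


def unit(v):
    """Scale a vector to unit Euclidean length."""
    return v / np.linalg.norm(v)


# Raw vectors, equation (3): squared norms of differences and dot products.
raw_norm2 = [float(np.sum((UN - T) ** 2)) for T in (T1, T2)]
raw_dot = [float(UN @ T) for T in (T1, T2)]

# Normalized vectors, equation (4): norms of differences and dot products.
nrm_norm = [float(np.linalg.norm(unit(T) - unit(UN))) for T in (T1, T2)]
nrm_dot = [float(unit(T) @ unit(UN)) for T in (T1, T2)]

# Decisions as (norm choice, dot product choice); index 0 is T(1), index 1 is T(2).
raw_decisions = (int(np.argmin(raw_norm2)), int(np.argmax(raw_dot)))
nrm_decisions = (int(np.argmin(nrm_norm)), int(np.argmax(nrm_dot)))

print(raw_norm2, raw_dot, raw_decisions)
# [2.25, 1.25] [23.5, 16.0] (1, 0)
print([round(x, 4) for x in nrm_norm], [round(x, 4) for x in nrm_dot], nrm_decisions)
# [0.2103, 0.2252] [0.9779, 0.9746] (0, 0)
```

The output shows the reversal described in the text. The raw measures disagree with each other. The normalized measures agree with each other, but both pick T(1).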

This example points out some problems with generalized correlations:

(1) Certain components may dominate the generalized correlation.

(2) The norm and dot product correlations may not give the same results.

(3) For norm correlation, the norms of the individual vectors can be important in the decision (if the vectors are not normalized); this relates to (1).

(4) Alignment is dominated by the large values; this also relates to (1).

Note that a problem is already developing: there is a discrepancy between the results of the generalized correlations and the intended results. This is discussed in more detail in section 2.5. The purpose of this discussion is not to select either the norm or the dot product correlation but to motivate the investigation of both. The relative merit of each will be discussed later.

2.2.2  Transform 

There has been a lot of effort to find a scene analysis algorithm that uses the DFT. Certain characteristics of the human visual process suggest that humans might use such a transform (Ref: 3). There is no mathematical proof that the DFT is optimum or even necessary. The discussions in 2.3 and 2.4 consider both the DFT and a general transform. The DFT (with unitary scaling) is a pure rotation in complex space. The general transform will also be confined to be a pure rotation, in order to keep the structure we are presently using. Later in the chapter we will loosen that restriction.
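Computing the generalized correlation in the transform space relies on the (unitary) DFT preserving lengths and angles. The short NumPy check below assumes NumPy's `norm="ortho"` scaling, which makes `np.fft.fft2` unitary. The random arrays are hypothetical stand-ins for an image and a template.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((8, 8))  # hypothetical N x N image
y = rng.random((8, 8))  # hypothetical N x N template

X = np.fft.fft2(x, norm="ortho")  # unitary 2-D DFT
Y = np.fft.fft2(y, norm="ortho")

# Norm correlation: the squared distance is the same in both domains (Parseval).
spatial_dist = np.sum((x - y) ** 2)
freq_dist = np.sum(np.abs(X - Y) ** 2)

# Dot product correlation: np.vdot(Y, X) = sum(conj(Y) * X), real up to rounding for real x, y.
spatial_dot = np.sum(x * y)
freq_dot = np.vdot(Y, X)
```

NumPy's default scaling for the forward transform applies no normalization. Without `"ortho"`, both frequency-domain quantities are multiplied by $N^2$. That rescales the correlations but does not reorder them.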


2.2.3  Preprocessing 

The two steps identified above (the transform and the generalized correlation) do not completely define the problem. Preprocessing may include many operations, such as clutter suppression or edge enhancement. Such operations are often the desired result of filtering in the frequency domain (Ref: 2). The particular preprocessing referred to here has three parts: a threshold algorithm blacks out the background, a first-order moment centers the letter (for identification only), and the letters are normalized. The normalization might set all non-zero components (the others having been set to zero by the threshold algorithm) to one, or it might set the magnitude of the $N^2$-dimensional vector to one. Preprocessing is not considered in this development, but it is discussed in section 2.7. It may seem strange that preprocessing is not included, but recall that the first effort is to formulate a solvable problem. Preprocessing can only be measured indirectly by the cost function, and it adds tremendously to the complexity. For now, we will assume that preprocessing has enhanced letter identification as much as possible.
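The preprocessing just described (threshold, first-moment centering, normalization) can be sketched as follows. This is a hypothetical illustration, not the thesis's implementation. The function name, the threshold value and the choice to center by an integer, wrap-around shift (`np.roll`) are all assumptions.

```python
import numpy as np


def preprocess(image, threshold, binary=True):
    """Hypothetical preprocessing sketch: threshold, center by first moments, normalize."""
    # Black out the background: pixels at or below the threshold become zero.
    out = np.where(image > threshold, image, 0.0)
    total = out.sum()
    if total == 0:
        return out  # nothing survived the threshold
    # First-order moments give the intensity centroid (row, column).
    rows, cols = np.indices(out.shape)
    r_bar = (rows * out).sum() / total
    c_bar = (cols * out).sum() / total
    # Shift the centroid to the grid center by a whole number of pixels (wraps at the edges).
    n_rows, n_cols = out.shape
    shift = (int(round((n_rows - 1) / 2 - r_bar)), int(round((n_cols - 1) / 2 - c_bar)))
    out = np.roll(out, shift, axis=(0, 1))
    if binary:
        return (out > 0).astype(float)  # every non-zero pixel set to one
    return out / np.linalg.norm(out)  # or: the N^2-dimensional vector scaled to unit length


# Hypothetical 9 x 9 image: a bright 3 x 3 "letter" near the top-left corner on a dim background.
img = np.full((9, 9), 0.1)
img[1:4, 1:4] = 5.0
letter = preprocess(img, threshold=1.0)
```

The two `binary` settings correspond to the two normalizations mentioned in the text. The wrap-around shift is only safe when the thresholded letter does not touch the image border.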


2.2.4  Non-Linear  Filters 

The algorithms discussed in this chapter are either norm or dot product correlations of linear functions. Nothing guarantees that a linear algorithm will be able to resolve the scene analysis problem; in fact, the problem may require a much higher order solution. We start by considering a linear problem and later generalize to higher order systems.

2.2.5  Notation 

Throughout the rest of this chapter, the notation is chosen to keep the various vectors, cost functions and generalized correlations distinct. Two main categories of generalized correlations are considered. The first is the dot product correlation and is represented by CDP. The second is the norm correlation and is represented by CN. The letters p and t are reserved to indicate classes; there are T classes. There are C samples in each class. The letters c and r are sample indices and are represented as arguments.

The generalized correlations return T values, one for each class of interest. Those values can be considered components of a T-dimensional vector. The indices corresponding to the components of a vector are expressed as superscripts or subscripts. For example, $UN^t$ is the $t$th component of a T-dimensional vector, and $I^{kl}$ is the $(k,l)$th element of an $N^2$-dimensional image vector.

Indicial summation notation is used. That notation implies summation over an index when the same letter appears as both a superscript and a subscript. For example

$$I^{kl}(p,r)\,F_{kl}(t) = \sum_{k}\sum_{l} I^{kl}(p,r)\,F_{kl}(t) \tag{5}$$

Summations over arguments are still expressed explicitly. The letters i, j, k, l, m and n are summed from 1 to N unless otherwise specified.
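NumPy's `einsum` follows the same convention and sums over any index letter that is repeated. In the sketch below, randomly generated arrays stand in for $I^{kl}(p,r)$ and $F_{kl}(t)$. Indices start at 0 in the code rather than 1.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4
I = rng.random((N, N))  # stands in for I^{kl}(p, r): one image, indexed by (k, l)
F = rng.random((N, N))  # stands in for F_{kl}(t): one coefficient array, indexed by (k, l)

# Right-hand side of equation (5): the explicit double sum over k and l.
explicit = sum(I[k, l] * F[k, l] for k in range(N) for l in range(N))

# Left-hand side: the repeated letters k and l are summed implicitly, as in indicial notation.
implicit = np.einsum("kl,kl->", I, F)
```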

There are two stages to calculating the cost for the optimization problem. First, the unknown vector is compared with the T templates. Second, the T values returned (referred to as the dot product or norm correlation values) are combined into a single cost.

2.3 Identification

In section 2.3.1 we set up the problem of optimizing the frequency filter used when identification is approached via the DFT. The norm correlation can be expressed mathematically as

$$CN(t) = \|FFUN - FFAT(t)\|^2 \tag{6}$$

where CN is the norm correlation, FFUN is the filtered DFT of the unknown vector, and FFAT(t) is the filtered DFT of the template for the tth class of vectors. Note that the generalized correlation can be computed in the transform space because the transform is a pure rotation (see the Appendix). The dot product correlation is

$$CDP(t) = \langle FFUN, FFAT(t)\rangle \tag{7}$$

where the angle brackets indicate a dot product.

The identity of the letter is chosen as the t corresponding to the smallest value of CN(t) and/or the largest value of CDP(t). The norm correlation can be expressed in terms of one of the test letters, that is

$$CN(t,p,r) = \|FFI(p,r) - FFAT(t)\|^2 \tag{8}$$

where FFI(p,r) is the filtered DFT of the rth element of the pth class of objects and CN(t,p,r) is the norm correlation of that vector with the tth template. The dot product correlation can be written

$$CDP(t,p,r) = \langle FFI(p,r), FFAT(t)\rangle \tag{9}$$

where CDP(t,p,r) is the dot product correlation value of the rth element of the pth class of objects with the tth template.

Now, one cost for the norm correlation could be

$$COSTN = \sum_{t=1}^{T}\sum_{r=1}^{C} CN(t,t,r) + \sum_{t=1}^{T}\sum_{p=1}^{T}\sum_{r=1}^{C} \frac{1-\delta_{pt}}{CN(t,p,r)} \tag{10}$$

where $\delta_{pt}$ is the Kronecker delta function and COSTN is the cost of the norm correlation. Note that the terms in the first sum are just the norm correlations of the elements of each class with the template for the class to which they belong. Those are the norm correlation values we want small. The terms in the second sum are the inverses of all other norm correlation values, which we want large. That means that we want the second sum small as well. Therefore minimizing this cost functional gives an optimal solution of the kind we are interested in.

A cost function for the dot product correlation could be

$$COSTDP = \sum_{t=1}^{T}\sum_{r=1}^{C} \frac{1}{CDP(t,t,r)} + \sum_{t=1}^{T}\sum_{p=1}^{T}\sum_{r=1}^{C} (1-\delta_{pt})\,CDP(t,p,r) \tag{11}$$

We want the dot product correlation values in the first sum to be large and those in the second sum to be small. Again, we minimize this cost functional to get an optimal solution.
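Once the correlation values are available, equations (10) and (11) and the decision rule are straightforward to evaluate. This sketch assumes the values are stored in NumPy arrays indexed `[t, p, r]` (template, class, sample). The array contents and function names are hypothetical.

```python
import numpy as np


def cost_norm(CN):
    """Equation (10). CN[t, p, r]: norm correlation of sample r of class p with template t."""
    T = CN.shape[0]
    idx = np.arange(T)
    own = CN[idx, idx, :]                        # CN(t, t, r): values we want small
    off = (~np.eye(T, dtype=bool))[:, :, None]   # (1 - delta_pt), broadcast over r
    inv = np.divide(1.0, CN, out=np.zeros_like(CN), where=off)  # 1 / CN(t, p, r) for p != t
    return float(own.sum() + inv.sum())


def cost_dot(CDP):
    """Equation (11). CDP[t, p, r] is indexed like CN."""
    T = CDP.shape[0]
    idx = np.arange(T)
    own = CDP[idx, idx, :]                       # CDP(t, t, r): values we want large
    off = (~np.eye(T, dtype=bool))[:, :, None]
    return float(np.sum(1.0 / own) + np.sum(np.where(off, CDP, 0.0)))


def decide_norm(CN):
    """Chosen template t for every (p, r): the smallest norm correlation."""
    return CN.argmin(axis=0)


def decide_dot(CDP):
    """Chosen template t for every (p, r): the largest dot product correlation."""
    return CDP.argmax(axis=0)


# Hypothetical values for T = 2 classes and C = 3 samples per class.
CN = np.full((2, 2, 3), 2.0)
CN[0, 0, :] = CN[1, 1, :] = 0.5
CDP = np.full((2, 2, 3), 1.0)
CDP[0, 0, :] = CDP[1, 1, :] = 4.0
```

In both costs, the same-class terms and the other-class terms pull in opposite directions. Every sample is correctly identified when `decide_norm(CN)[p, r] == p` (or `decide_dot(CDP)[p, r] == p`) holds throughout.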

2.3.1  Optimization 

A cost function has been developed, and generalized correlation values are to be used in calculating the cost. Expressions for the norm and dot product correlations are given in equations (8) and (9). This section expands and rearranges those expressions. We will see that it may be reasonable to combine the transform, the filter and the template so that all can be optimized simultaneously. First we expand the norm correlation:

$$CN(t,p,r) = \sum_{m=1}^{N}\sum_{n=1}^{N} \left|FFI^{mn}(p,r) - FFAT^{mn}(t)\right|^2 \tag{12}$$

where the norm in equation (8) has been expanded into a sum of squares of the components. The filtered DFTs of the image and template can be expanded by substituting

$$FFI^{mn}(p,r) = F_{(mn)}\,FC_{(mn)kl}\,I^{kl}(p,r) \tag{13}$$

and (the parentheses indicate no summation)

$$FFAT^{mn}(t) = F_{(mn)}\,FC_{(mn)kl}\,AT^{kl}(t) \tag{14}$$

where F is the frequency filter to optimize, I(p,r) is the rth element from the pth class of images, FC is the array of DFT coefficients (see the Appendix) and AT(t) is the template for the tth class of images. Then

$$CN(t,p,r) = \sum_{m=1}^{N}\sum_{n=1}^{N} \Big[F_{(mn)}\,FC_{(mn)kl}\big(I^{kl}(p,r) - AT^{kl}(t)\big)\Big]\Big[F_{(mn)}\,FC_{(mn)ij}\big(I^{ij}(p,r) - AT^{ij}(t)\big)\Big]^{*} \tag{15}$$

where the summation over m and n applies to the entire product. Because the image, template and filter are real, the complex conjugate can be moved inside the brackets onto the DFT coefficients, and the equation can be rewritten


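$$CN(t,p,r) = \sum_{m=1}^{N}\sum_{n=1}^{N} \big(F_{(mn)}\big)^2\,FC_{(mn)kl}\,FC^{*}_{(mn)ij}\Big[\big(I^{kl}(p,r) - AT^{kl}(t)\big)\big(I^{ij}(p,r) - AT^{ij}(t)\big)\Big] \tag{16}$$

The term in the square brackets can be expanded and the summations carried over each term individually to give

$$\begin{aligned}
CN(t,p,r) ={}& \Big[\sum_{m}\sum_{n}\big(F_{(mn)}\big)^2 FC_{(mn)kl}\,FC^{*}_{(mn)ij}\Big] I^{kl}(p,r)\,I^{ij}(p,r)\\
&- 2\Big[\sum_{m}\sum_{n}\big(F_{(mn)}\big)^2 FC_{(mn)kl}\,FC^{*}_{(mn)ij}\,AT^{ij}(t)\Big] I^{kl}(p,r)\\
&+ \Big[\sum_{m}\sum_{n}\big(F_{(mn)}\big)^2 FC_{(mn)kl}\,FC^{*}_{(mn)ij}\,AT^{kl}(t)\,AT^{ij}(t)\Big]
\end{aligned} \tag{17}$$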

From this formulation it is apparent that the norm correlation is a second order correlation in the data (the data being the unknown image). Assuming for the moment that the optimum filter is known, the summations inside the brackets of equation (17) are completely determined and can be replaced by single arrays. That is

$$CN(t,p,r) = K1_{klij}\,I^{kl}(p,r)\,I^{ij}(p,r) + K2_{kl}(t)\,I^{kl}(p,r) + K3(t) \tag{18}$$

where the coefficients K1, K2(t) and K3(t) are determined by the summations in the square brackets of equation (17).
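The reduction from equation (15) to the quadratic form (18) can be checked numerically. The sketch below builds the DFT-coefficient array FC as an $N^2 \times N^2$ matrix from NumPy's unitary 2-D DFT. It picks a random real filter, image and template (hypothetical data) and compares the direct norm correlation with K1, K2(t) and K3(t) formed from the bracketed sums of (17). One assumption to flag: the sketch keeps only the real parts of the bracketed sums. The first bracket is Hermitian in $(kl, ij)$, so for a real image and template this leaves the value of CN unchanged.

```python
import numpy as np


def dft_matrix_2d(N):
    """FC[(m, n), (k, l)] as an N^2 x N^2 matrix, built column by column from the unitary 2-D DFT."""
    FC = np.empty((N * N, N * N), dtype=complex)
    for j in range(N * N):
        basis = np.zeros(N * N)
        basis[j] = 1.0
        FC[:, j] = np.fft.fft2(basis.reshape(N, N), norm="ortho").ravel()
    return FC


rng = np.random.default_rng(3)
N = 4
FC = dft_matrix_2d(N)
F = rng.random(N * N)   # hypothetical real filter F_(mn), flattened
I = rng.random(N * N)   # hypothetical image I^{kl}(p, r), flattened
AT = rng.random(N * N)  # hypothetical template AT^{kl}(t), flattened

# Direct evaluation of equations (12)-(15): squared norm of the filtered DFT difference.
cn_direct = np.sum(np.abs(F * (FC @ (I - AT))) ** 2)

# First bracket of (17): M[kl, ij] = sum_mn F_(mn)^2 FC_(mn)kl conj(FC_(mn)ij).
M = (FC.T * F**2) @ FC.conj()
K1 = M.real                 # independent of the class t
K2 = -2.0 * (M.real @ AT)   # depends on t through the template
K3 = AT @ M.real @ AT       # depends on t through the template

# Equation (18): second order in the data I.
cn_quadratic = I @ K1 @ I + K2 @ I + K3
```

Only the filter enters K1, while the template enters K2 and K3. Once these arrays are fixed, evaluating CN no longer requires any transform.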

The dot product can be expanded by substituting equations (13) and (14) into (9), to give

$$CDP(t,p,r) = \sum_{m=1}^{N}\sum_{n=1}^{N} \Big(F_{(mn)}\,FC_{(mn)kl}\,I^{kl}(p,r)\Big)\Big(F_{(mn)}\,FC_{(mn)ij}\,AT^{ij}(t)\Big)^{*} \tag{19}$$

The summations can be interchanged to give

$$CDP(t,p,r) = \Big[\sum_{m=1}^{N}\sum_{n=1}^{N}\big(F_{(mn)}\big)^2 FC_{(mn)kl}\,FC^{*}_{(mn)ij}\,AT^{ij}(t)\Big] I^{kl}(p,r) \tag{20}$$

Again, assuming the optimum filter has been found, the term in the brackets is completely determined and can be replaced with a single array. We get

$$CDP(t,p,r) = K4_{kl}(t)\,I^{kl}(p,r) \tag{21}$$

Clearly this is a first order equation in the data.
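Equation (21) says that the filtered frequency-domain dot product is an ordinary spatial dot product with a single array K4(t). For the 2-D DFT the coefficient array is symmetric, $FC_{(mn)kl} = FC_{(kl)mn}$, so the bracket in (20) can itself be computed with one more DFT. The sketch below assumes NumPy's unitary 2-D DFT stands in for FC and uses hypothetical random data. It keeps complex values, because a general real filter need not make K4(t) real.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 8
F = rng.random((N, N))   # hypothetical real frequency filter F_(mn)
I = rng.random((N, N))   # hypothetical image I^{kl}(p, r)
AT = rng.random((N, N))  # hypothetical template AT^{kl}(t)


def fc(x):
    """Apply the DFT-coefficient array FC, taken here as the unitary 2-D DFT."""
    return np.fft.fft2(x, norm="ortho")


# Equation (19): dot product of the filtered DFTs, conjugating the template factor.
cdp_direct = np.sum((F * fc(I)) * np.conj(F * fc(AT)))

# Bracket of equation (20): K4_kl(t) = sum_mn F_(mn)^2 FC_(mn)kl conj(FC_(mn)ij) AT^ij(t).
# Because FC is symmetric, the sum over (m, n) is one more DFT.
K4 = fc(F**2 * np.conj(fc(AT)))

# Equation (21): first order in the data, a plain spatial dot product with K4(t).
cdp_linear = np.sum(K4 * I)
```

Once K4(t) has been computed, identification needs no transform at all, only one spatial dot product per class.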

This analysis brings out two important points. The first is that the norm correlation is second order while the dot product correlation is first order. That would lead one to expect the norm correlation to have more potential than the dot product correlation. The second is that it may be as simple to optimize all four arrays K1, K2(t), K3(t) and K4(t) as it is to optimize just the filter.

2.3.2  General  Optimization 

The last section points out that equations (16) and (19) could be used to optimize the templates, the filter and the transform coefficients. Because these quantities enter the correlations only through their products, optimizing them is equivalent to optimizing the combined arrays given in equations (18) and (21).

The two cost functions developed are exact opposites: one rewards small correlation values for the correct class, the other large ones. For the general optimization it is convenient to eliminate one, so we need to decide which to use. That decision turns out not to be difficult.

We choose the dot product type of cost function. The reason is that the difference between two vectors must be small for the norm type of correlation function to work. Suppose a norm correlation function is given two inputs, the second being the first with a constant added to each component. It cannot return a small value in both cases. The dot product correlation does not suffer from that problem.

The generalized optimization problem is then to minimize

$$COST = \sum_{t=1}^{T}\sum_{r=1}^{C} \frac{1}{CG(t,t,r)} + \sum_{t=1}^{T}\sum_{p=1}^{T}\sum_{r=1}^{C} (1-\delta_{pt})\,CG(t,p,r) \tag{22}$$

and the generalized correlation is given by

$$C1G(t,p,r) = K1_{kl}(t)\,I^{kl}(p,r) + K2(t) \tag{23}$$

for a first order transformation and

$$C2G(t,p,r) = K3_{klij}(t)\,I^{kl}(p,r)\,I^{ij}(p,r) + K4_{kl}(t)\,I^{kl}(p,r) + K5(t) \tag{24}$$

for the second order system. Notice that this could be extended to a system of any order.
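Equations (23) and (24) map an image to T correlation values, one per class. Cost (22) rewards a large value for the true class, so the natural decision picks the largest value. The NumPy sketch below uses hypothetical array shapes with a leading class axis t and `einsum` for the indicial sums. K1 through K5 are the arrays of (23) and (24), filled here with random numbers.

```python
import numpy as np


def first_order(K1, K2, I):
    """Equation (23) for every class t. K1: (T, N, N), K2: (T,), I: (N, N)."""
    return np.einsum("tkl,kl->t", K1, I) + K2


def second_order(K3, K4, K5, I):
    """Equation (24) for every class t. K3: (T, N, N, N, N), K4: (T, N, N), K5: (T,), I: (N, N)."""
    quadratic = np.einsum("tklij,kl,ij->t", K3, I, I)
    linear = np.einsum("tkl,kl->t", K4, I)
    return quadratic + linear + K5


# Hypothetical sizes: T = 3 classes, 2 x 2 images.
rng = np.random.default_rng(5)
T, N = 3, 2
I = rng.random((N, N))
K1, K2 = rng.random((T, N, N)), rng.random(T)
K3, K4, K5 = rng.random((T, N, N, N, N)), rng.random((T, N, N)), rng.random(T)

c1 = first_order(K1, K2, I)
c2 = second_order(K3, K4, K5, I)
decision = int(np.argmax(c2))  # the class with the largest generalized correlation
```

The second order term raises the number of coefficients per class from $N^2 + 1$ to $N^4 + N^2 + 1$. That is the price of the extra potential of second order correlations noted in section 2.3.1.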

An interesting note is that optimizing a frequency filter for dot product identification is a restricted case of the more general problem of optimizing the templates in the spatial domain. Equation (23) is just a dot product in the spatial domain with a constant added. The constant will have a material effect on the decision and cannot be assumed to be zero. The constant arrays $K1_{kl}(t)$ can then be viewed as optimum templates. We showed in the last section that the dot product correlation generalizes to this form.

2.4 Location

All the ideas developed in the last section can be applied to the location problem simply by stating the problem in the same format. We do that in this introduction, and the remainder of the development is straightforward.

First, the classes have to be established. To do that we need only find the number of possible locations in the image and establish T' classes, one per location. We then pick out C' samples for each class. We now have the samples we need and can use the same equations as developed in section 2.3.

2.4.1 Optimal Filter

Everything stays the same as in the last section; only the classes have changed. The equations are not duplicated here. The key concept to note is that the procedure is exactly the same in either case; only the data used to produce the optimal filter change.

2.4.2 General Optimization

Again, the equations are exactly the same as for identification; only the sample set has changed. We can use transformations of various orders: first, second, third and so on.

2.5 Alternate Point of View

The motivation for separating location and identification can be put on firmer ground. We might attempt to form T x T' classes, where T is the number of objects of interest and T' is the number of possible locations. That would give optimum location and identification simultaneously. However, the output vectors previously had dimensions T and T', while the output vector of this new problem has dimension T x T'. That makes the whole problem more difficult simply by increasing the dimension. The other problem is that we may not be getting what we want. First, if we are viewing a whole page of text simultaneously, the number of locations will be extremely high. Second, there will be multiple locations for the same letter. This problem formulation cannot approach that situation directly. In any case, what we probably want to do is look at a relatively small area, locate a letter and then determine the identity of that letter. That specifically requires separate location and identification algorithms.

The results of the last two sections suggest a different way of looking at the scene analysis problem. Previously we transformed the data, filtered the data and then did a generalized correlation with some template. That approach is to theorize about a system, develop that system and then check whether the results are what is desired.

We found in the last two sections that the above approach reduces to first and second order transformations. We also found that optimizing those transformations is equivalent to optimizing the template, filter and transformation. That suggests a new point of view. We have established a data space. We can also establish a destination space, which contains the results of interest to us. We then optimize an Rth order transformation between the two. To do that we make use of samples from the data space whose analysis is known. The analysis is done by a human.
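The sketch below illustrates optimizing a first order (R = 1) mapping from the data space to the destination space. It fits an affine map by least squares to labeled samples, using one-hot destination vectors (one component per class). The least-squares cost is an assumption made here for illustration, chosen because its minimization reduces to linear equations. It is not the cost function (22), and the data are synthetic.

```python
import numpy as np


def fit_affine_map(X, S):
    """Least-squares affine map from data space to destination space (assumed cost).

    X: (samples, N*N) data-space vectors; S: (samples, T) desired destination vectors.
    Returns W of shape (N*N + 1, T) minimizing ||[X 1] W - S||^2.
    """
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])  # constant column plays the role of an offset
    W, *_ = np.linalg.lstsq(Xa, S, rcond=None)      # least-squares solve of the linear equations
    return W


def apply_map(W, X):
    """Map data-space vectors into the destination space."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xa @ W


# Synthetic labeled samples: T = 3 classes of 4 x 4 images, 20 noisy copies of a prototype each.
rng = np.random.default_rng(6)
T, dim, per_class = 3, 16, 20
prototypes = rng.random((T, dim))
labels = np.repeat(np.arange(T), per_class)
X = prototypes[labels] + 0.02 * rng.standard_normal((T * per_class, dim))
S = np.eye(T)[labels]  # destination vectors: one-hot, one dimension per result of interest

W = fit_affine_map(X, S)
predicted = apply_map(W, prototypes).argmax(axis=1)  # class = largest destination component
```

A second order (R = 2) mapping could be fitted the same way by appending the products $I^{kl}I^{ij}$ as extra columns of `X`, at the cost of many more unknowns.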

The key thing to realize is that in the original point of view we were attempting to develop a transformation based on our knowledge of the data space. In the new point of view we are only concerned that the results of the transformation be of use to us. In other words, the only structure that we put on the transformation is its order, and that is only to make the optimization manageable.

2.6 Probabilistic Point of View

This section briefly discusses the problem from a probabilistic point of view, which yields some valuable insights.

The first step in laying out the problem from the probabilistic point of view is to establish a sample space. The events of interest are physical realizations of handprinted uppercase letters. A sample space $\Omega_1$ can be defined that consists of all handprinted uppercase letters. Each of these letters, when digitized, can result in many different $N^2$-tuples of numbers. The $N^2$-tuple of numbers is the result of digitizing the light intensity at each intersection of an N by N grid of lines covering the letter. The noise associated with the digitization can be taken to be the elements of a second sample space $\Omega_2$. Those spaces are

$$\Omega_1 = \{\omega : \omega \text{ is a handprinted uppercase letter}\}, \qquad \Omega_2 = \{w : w \text{ is a sample of the noise of digitization}\} \tag{25}$$

The total sample space can be taken to be the Cartesian product of those two spaces

$$\Omega = \Omega_1 \times \Omega_2 \tag{26}$$

The assumption made here is that the subspace $\Omega_2$ can be ignored because its effect is relatively small. That assumption can be translated into more physical terms. Given a handprinted uppercase letter and the digitization of that letter, the displayed digitized letter can easily be recognized as the handprinted letter itself. This is not true in general for all digitized data, but we will assume that the digitizer being used is of high quality.

The probability space triple can be set up as

$$\langle \Omega, \mathcal{F}, P \rangle \tag{27}$$

where $\Omega$ is the sample space described, $\mathcal{F}$ is the collection of sets on that space that are measurable and form a sigma-algebra, and $P$ is the probability measure on those sets. The sigma-algebra can be taken to be all sets made up of a single letter, all unions of those sets, the null set and the entire space. The probability triple has now been completely defined and the analysis can proceed.

The next step is to define a random vector that maps an element of $\Omega$ into $R^{N^2}$. That random vector is represented by the sequence $x_i(\cdot)$, where $x_i(\cdot)$ is the random variable associated with the $i$th coordinate of the vector in $R^{N^2}$ (equivalent to $I_{kl}(p,c)$ where $i=(k-1)N+l$). The sample space is divided into 26 classes, defined so that each includes all letters of one type. That is


$$\begin{aligned}
C_1 &= \{\omega : \omega \text{ is the result of handprinting an A}\} \\
C_2 &= \{\omega : \omega \text{ is the result of handprinting a B}\}
\end{aligned} \tag{28}$$

and so on. The elements of $C_i$ are referred to as $\omega(i,\cdot)$. One final set of vectors is needed: the vectors of interest in the destination space $R^{26}$. We want members of $C_i$ to map into the vector $S_j(i)$, where the argument $i$ refers to the class to which the vector corresponds and the index $j$ refers to the component of the vector. We set

$$S_j(i) = \delta_{ij} \tag{29}$$

where $\delta_{ij}$ is the Kronecker delta. The mapping from $R^{N^2}$ to $R^{26}$ can be of any order, but for the time being it is taken to be linear. The mapping is represented by an array $F_i(j)$, where $i$ ranges from 1 to $N^2$ and $j$ ranges from 1 to 26. The mapping is then

$$y(j,\cdot) = \sum_{i=1}^{N^2} F_i(j)\, x_i(\cdot) \tag{30}$$

where $y(j,\cdot)$ is the $j$th component of a random vector in the destination space and $x(\cdot)$ is a random vector in the data space.
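As a hedged illustration of (30), the sketch below flattens an N by N image row by row (the 0-based form of $i=(k-1)N+l$) and applies a weight array to obtain one output per class. The storage layout `F[j][i]`, the function names and the toy data are assumptions made for this example. They are not taken from the thesis.

```python
# Sketch of the linear (first order) mapping of equation (30).
# Assumptions (not from the thesis): an image is a list of N rows of N
# intensities, F[j] holds the weights F_i(j) for destination component j,
# and indices are 0-based, so i = (k-1)N + l becomes i = k*N + l.


def flatten(image):
    """Row-major flattening: pixel (k, l) goes to position k*N + l."""
    n = len(image)
    return [image[k][l] for k in range(n) for l in range(n)]


def linear_map(F, image):
    """Return y(j) = sum_i F_i(j) * x_i for every component j."""
    x = flatten(image)
    return [sum(f_i * x_i for f_i, x_i in zip(row, x)) for row in F]


def classify(F, image):
    """Pick the class whose destination component is largest."""
    y = linear_map(F, image)
    return max(range(len(y)), key=lambda j: y[j])


# Toy usage with T = 2 classes of 2 x 2 "letters" (made-up data).
F = [[0, -1, 1, 0],   # class 0: rewards pixel (1,0), penalizes (0,1)
     [0, 1, -1, 0]]   # class 1: the opposite
vertical = [[1, 0], [1, 0]]
horizontal = [[1, 1], [0, 0]]
print(linear_map(F, vertical), classify(F, vertical))      # [1, -1] 0
print(linear_map(F, horizontal), classify(F, horizontal))  # [-1, 1] 1
```

Each output depends only on a weighted sum of individual pixels. This is the limitation discussed later in this section.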

The cost function (similar to equation (22)) is

$$\text{COST} = E\left[ -y(p,\omega(p,\cdot)) + \sum_{k=1}^{T} y(k,\omega(p,\cdot))\,(1-\delta_{pk}) \right] \tag{31}$$


where the expectation is over all $\omega$. Any number of cost functions might work, but this one clearly measures what we want: minimizing it maximizes the component of the destination vector that corresponds to the true identity (first term) and minimizes all other components (second term). In the next section we consider a quadratic cost function so that we can synthesize a solution.
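The following is a minimal sketch of evaluating (31) on a finite labelled sample. It assumes every sample is equally likely, so the expectation becomes an average with weight $1/(CT)$. The callable `y_of` and the sample layout are hypothetical. The sign convention (true-class term subtracted) is our reading of the text.

```python
# Sketch of the cost (31) with the expectation replaced by an average over
# a finite labelled sample, every sample weighted 1/(C*T).
# Assumptions: y_of(k, w) is a hypothetical callable giving destination
# component k for sample w; samples[t] lists the C samples of class t; the
# true-class term carries a minus sign so that minimizing the cost raises it.


def empirical_cost(y_of, samples):
    T = len(samples)
    C = len(samples[0])
    total = 0.0
    for t in range(T):
        for w in samples[t]:
            total -= y_of(t, w)  # component of the true identity
            total += sum(y_of(k, w) for k in range(T) if k != t)  # others
    return total / (C * T)


def perfect(k, w):
    """Toy mapping: each sample w is its own class label."""
    return 1.0 if k == w else 0.0


print(empirical_cost(perfect, [[0, 0], [1, 1]]))  # -1.0
```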

The probabilistic statement can be reduced to a deterministic statement as follows. Take $\Omega_1$ to consist of $T$ classes of $C$ sample vectors each. Elements of $\Omega_1$ will be denoted $\omega(t,c)$, where $t$ indicates the class and $c$ indicates which element of that class. We then assume that the probability of occurrence of each $\omega(t,c)$ is $(CT)^{-1}$, that is, each element has an equal probability of occurrence. The expectation can then be expressed as a sum:

$$\text{COST} = \sum_{t=1}^{T}\sum_{c=1}^{C} (CT)^{-1}\left[ -y(t,\omega(t,c)) + \sum_{k=1}^{T} y(k,\omega(t,c))\,(1-\delta_{tk}) \right] \tag{32}$$

The summations can be rearranged to get

$$\text{COST} = (CT)^{-1}\left[ -\sum_{t=1}^{T}\sum_{c=1}^{C} y(t,\omega(t,c)) + \sum_{t=1}^{T}\sum_{k=1}^{T}\sum_{c=1}^{C} (1-\delta_{kt})\, y(k,\omega(t,c)) \right] \tag{33}$$

This is the same as equation (22) except for the additional constant factor $(CT)^{-1}$.

The linear condition on the mapping is very limiting. It means that the decisions are based on the value of individual components of the random vector compared with that same component of other vectors. The problem is that no cross correlations between various components of a single random vector can be used. The following paragraph attempts to clarify why that severely limits us and why higher order solutions might be of use.

First we will attempt to compare an O and a Q using a linear mapping. For simplicity of discussion we will assume the letters have been prepared by a threshold device that has set all values to either zero or one. Consider a single component of the random vector. The probability that this component will be 1 may be the same for all classes. If so, there is absolutely no information of value in that pixel. At the other extreme, there may be a pixel that is always 1 for a given class and always 0 for all other classes. In that case the pixel contains all the information necessary to identify the single class whose value is 1. How can that be applied to an O and a Q? One would expect that the only reliable difference would be on the lower right side. To identify the Q, one might use some points toward the center from the circle and some toward the lower right corner from the circle (see figure 1a). It might then be possible that an O that was not quite on center would be bright at some of these points. What we would like is to require that points on a diagonal and some distance apart be simultaneously large, but that requires a second order mapping (see figure 1b). We would require that a product of pixels connected by the lines in figure 1b be large for the letter to be a Q. The decision would be such that a circle alone would score the letter as an O, with the Q correlation, say, 30% lower. Then, if the diagonal line exists, it will push the Q correlation higher than the O correlation.
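The argument above can be made concrete with a made-up two-pixel example. Each sample is a pair of thresholded pixels on the diagonal where a Q's tail would lie. The data are chosen so that each pixel is lit half the time in both classes, so neither pixel carries information on its own. The sample values and function names are illustrative assumptions, not data from the thesis.

```python
# Toy illustration (made-up data) of why a first order mapping cannot
# separate these classes while a second order one can. Each sample is
# (a, b): two thresholded pixels on the diagonal where a Q's tail lies.
Q_SAMPLES = [(1, 1), (0, 0), (1, 1), (0, 0)]   # pixels agree
O_SAMPLES = [(1, 0), (0, 1), (1, 0), (0, 1)]   # off-centre O: one pixel lit


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def first_order_stats(samples):
    """P(pixel = 1) for each pixel: all a linear mapping can exploit."""
    return (mean(a for a, _ in samples), mean(b for _, b in samples))


def second_order_stat(samples):
    """P(both pixels = 1): needs the product a*b."""
    return mean(a * b for a, b in samples)


def quadratic_score(a, b):
    """A second order mapping that is 1 when the pixels agree, else 0."""
    return 1 - a - b + 2 * a * b


print(first_order_stats(Q_SAMPLES), first_order_stats(O_SAMPLES))
# (0.5, 0.5) (0.5, 0.5): the individual pixels carry no information
print(second_order_stat(Q_SAMPLES), second_order_stat(O_SAMPLES))  # 0.5 0.0
```

No linear score $w_1 a + w_2 b + c$ can rank every Q above every O here. Preferring (1,1) over (1,0) needs $w_2 > 0$, while preferring (0,0) over (0,1) needs $w_2 < 0$. The product term removes the conflict.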

The value of the higher order system is apparent. A third order system can look for curves, straight lines, intersections or continuity. Continuity of curves might be useful in distinguishing between an O and a C, for example.
[Figure 1: Characteristics of O and Q. (a) First Order Pixels.]

4 The extension to second order systems is discussed in section 2.3.2. The cost function will remain the same; only the transformation will change.

Two important concepts have been discussed in this section. First, reasonable assumptions take us directly from the probabilistic formulation to the deterministic formulation. Second, a linear system severely limits the decision making ability of an algorithm.

2.7 A Solution to Isolated Letter Identification

The large dimensions of the problem and the non-quadratic cost function make the synthesis of a solution very difficult. It may be useful to simplify the problem statement one step further so that a solution can be synthesized. We do that in this section. There is a compromise: we need a quadratic cost function. The compromise is discussed in the next section.

2.7.1 Quadratic Cost Function

The vector we want to optimize is $y(t,\omega(p,c))$, where $\omega(p,c)$ is the $c$th element of the $p$th class of vectors. What we really want is for all vectors of the form $y(t,\omega(t,c))$ to be large and all elements of the form $y(t,\omega(p,c))$, $t \ne p$, to be small. That is well represented by the cost function given in equation (22). As an alternative, we might require that the vectors of the form $y(t,\omega(t,c))$ be close to 1. That is a compromise: it requires that two vectors that differ only by a constant factor both map to one, and that spends cost on an objective in which we really have no interest. Nonetheless, it does result in a solvable problem. The compromise might not be bad if the images are "normalized". The quadratic cost function is


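$$\text{COST} = \frac{1}{TC}\sum_{t=1}^{T}\sum_{p=1}^{T}\sum_{c=1}^{C} \left[y(t,\omega(p,c)) - \delta_{tp}\right]^2 (1 + B\,\delta_{tp}) \tag{34}$$

This cost function drives vectors of the form $y(t,\omega(t,c))$ to 1 and vectors of the form $y(t,\omega(p,c))$, $t \ne p$, to zero. It puts a weighting of $1+B$ on driving vectors of the form $y(t,\omega(t,c))$ toward 1.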
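The following is a direct, hedged transcription of (34) that can be used to check any candidate mapping. The callable `y_of`, the sample layout and the toy data are assumptions for illustration.

```python
# Sketch of the quadratic cost (34) evaluated directly.
# Assumptions: y_of(t, image) is a hypothetical callable for component t,
# samples[p] lists the C images of class p, B is the extra weight on the
# true-class terms, and classes are numbered from 0.


def quadratic_cost(y_of, samples, B):
    T = len(samples)
    C = len(samples[0])
    total = 0.0
    for t in range(T):
        for p in range(T):
            target = 1.0 if t == p else 0.0   # delta_tp
            weight = 1.0 + B * target         # 1 + B * delta_tp
            for image in samples[p]:
                total += weight * (y_of(t, image) - target) ** 2
    return total / (T * C)


def zero_map(t, image):
    return 0.0


# With every output 0, only the true-class terms contribute: cost = 1 + B.
print(quadratic_cost(zero_map, [["a1", "a2"], ["b1", "b2"]], B=3.0))  # 4.0
```

The value $1+B$ for the all-zero mapping is the constant term that reappears in the expanded cost below.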

2.7.2 Expansion of Cost Function

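To solve the equation we must first expand the cost in terms of the transform coefficients, $F$. That is, we substitute

$$y(t,\omega(p,c)) = \sum_{k=1}^{N}\sum_{l=1}^{N} F1_{kl}(t)\, I_{kl}(p,c) + F2(t) \tag{35}$$

where $F1_{kl}(t)$ are the pixel weights and $F2(t)$ is a constant term for class $t$, into the cost function, equation (34). That gives us

$$\text{COST} = \frac{1}{TC}\sum_{t=1}^{T}\sum_{p=1}^{T}\sum_{c=1}^{C} \left[\sum_{k,l} F1_{kl}(t)\, I_{kl}(p,c) + F2(t) - \delta_{tp}\right]^2 (1 + B\,\delta_{tp}) \tag{36}$$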

Expanding the square and carrying over the last term gives

$$\begin{aligned}
\text{COST} = \frac{1}{TC}\sum_{t=1}^{T}\sum_{p=1}^{T}\sum_{c=1}^{C} \Big\{ & \Big[\sum_{k,l} F1_{kl}(t) I_{kl}(p,c)\Big]\Big[\sum_{i,j} F1_{ij}(t) I_{ij}(p,c)\Big](1+B\delta_{tp}) \\
& - 2\sum_{k,l} F1_{kl}(t) I_{kl}(p,c)\,(\delta_{tp} + B\delta_{tp}^2) + (\delta_{tp}^2 + B\delta_{tp}^3) \\
& + F2(t)^2 (1+B\delta_{tp}) - 2F2(t)(\delta_{tp} + B\delta_{tp}^2) \\
& + 2\sum_{k,l} F1_{kl}(t) I_{kl}(p,c)\, F2(t)\,(1+B\delta_{tp}) \Big\}
\end{aligned} \tag{37}$$

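The summations can be interchanged and the delta functions summed over to give

$$\begin{aligned}
\text{COST} ={} & \sum_{t=1}^{T}\sum_{k,l}\sum_{i,j} F1_{kl}(t) F1_{ij}(t)\Big[\frac{1}{TC}\sum_{p=1}^{T}\sum_{c=1}^{C} I_{kl}(p,c) I_{ij}(p,c)\Big] \\
& + \frac{B}{T}\sum_{t=1}^{T}\sum_{k,l}\sum_{i,j} F1_{kl}(t) F1_{ij}(t)\Big[\frac{1}{C}\sum_{c=1}^{C} I_{kl}(t,c) I_{ij}(t,c)\Big] \\
& - \frac{2(1+B)}{T}\sum_{t=1}^{T}\sum_{k,l} F1_{kl}(t)\Big[\frac{1}{C}\sum_{c=1}^{C} I_{kl}(t,c)\Big] + (1+B) \\
& + \frac{T+B}{T}\sum_{t=1}^{T} F2(t)^2 - \frac{2(1+B)}{T}\sum_{t=1}^{T} F2(t) \\
& + 2\sum_{t=1}^{T}\sum_{k,l} F1_{kl}(t) F2(t)\Big[\frac{1}{TC}\sum_{p=1}^{T}\sum_{c=1}^{C} I_{kl}(p,c)\Big] \\
& + \frac{2B}{T}\sum_{t=1}^{T}\sum_{k,l} F1_{kl}(t) F2(t)\Big[\frac{1}{C}\sum_{c=1}^{C} I_{kl}(t,c)\Big]
\end{aligned}$$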


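The following substitutions can be made:

$$J_{klij} = \frac{1}{TC}\sum_{p=1}^{T}\sum_{c=1}^{C} I_{kl}(p,c)\, I_{ij}(p,c)$$

$$J_{klij}(t) = \frac{1}{C}\sum_{c=1}^{C} I_{kl}(t,c)\, I_{ij}(t,c)$$

$$M_{kl}(t) = \frac{1}{C}\sum_{c=1}^{C} I_{kl}(t,c)$$

$$M_{kl} = \frac{1}{TC}\sum_{t=1}^{T}\sum_{c=1}^{C} I_{kl}(t,c)$$

to give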


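$$\begin{aligned}
\text{COST} ={} & \sum_{t=1}^{T}\sum_{k,l}\sum_{i,j} F1_{kl}(t) F1_{ij}(t)\Big[J_{klij} + \frac{B}{T} J_{klij}(t)\Big] - \frac{2(1+B)}{T}\sum_{t=1}^{T}\sum_{k,l} F1_{kl}(t)\, M_{kl}(t) + (1+B) \\
& + \frac{T+B}{T}\sum_{t=1}^{T} F2(t)^2 - \frac{2(1+B)}{T}\sum_{t=1}^{T} F2(t) + 2\sum_{t=1}^{T}\sum_{k,l} F1_{kl}(t) F2(t)\Big[M_{kl} + \frac{B}{T} M_{kl}(t)\Big]
\end{aligned} \tag{43}$$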


$J_{klij}$ can be interpreted as the correlation value (an uncentered second moment) of the pixel intensity at $k,l$ with that at $i,j$ over all elements of the sample space. $J_{klij}(t)$ can be interpreted as the correlation value of the pixel intensities at $k,l$ and at $i,j$ over only the elements in class $t$. In the same way, $M_{kl}(t)$ can be considered the mean of the $k,l$ pixel intensity over the $t$th class, and $M_{kl}$ the mean over all classes.
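The sketch below computes the four statistics for flattened images. It then checks numerically that (43) reproduces the direct cost (34), with $y$ given by (35). The data layout, the function names and the test values are assumptions for illustration.

```python
# Sketch: the statistics J, J(t), M and M(t) defined above, plus a
# numerical check that the expanded cost (43) agrees with the direct form
# (34)-(35). Assumptions: images are already flattened to length-n lists
# (the pixel pair k,l collapsed to one index), samples[t] lists the C
# images of class t, F1[t] is a weight list and F2[t] a scalar; 0-based.


def class_stats(samples):
    T, C, n = len(samples), len(samples[0]), len(samples[0][0])
    Jt = [[[sum(x[a] * x[b] for x in samples[t]) / C for b in range(n)]
           for a in range(n)] for t in range(T)]
    Mt = [[sum(x[a] for x in samples[t]) / C for a in range(n)]
          for t in range(T)]
    J = [[sum(Jt[t][a][b] for t in range(T)) / T for b in range(n)]
         for a in range(n)]
    M = [sum(Mt[t][a] for t in range(T)) / T for a in range(n)]
    return J, Jt, M, Mt


def cost_direct(F1, F2, samples, B):
    """Equation (34) with y given by (35)."""
    T, C = len(samples), len(samples[0])
    total = 0.0
    for t in range(T):
        for p in range(T):
            d = 1.0 if t == p else 0.0
            for x in samples[p]:
                y = sum(f * v for f, v in zip(F1[t], x)) + F2[t]
                total += (y - d) ** 2 * (1.0 + B * d)
    return total / (T * C)


def cost_expanded(F1, F2, samples, B):
    """Equation (43), written with the statistics."""
    J, Jt, M, Mt = class_stats(samples)
    T, n = len(samples), len(J)
    total = 1.0 + B
    for t in range(T):
        f = F1[t]
        total += sum(f[a] * f[b] * (J[a][b] + B / T * Jt[t][a][b])
                     for a in range(n) for b in range(n))
        total -= 2 * (1 + B) / T * sum(f[a] * Mt[t][a] for a in range(n))
        total += (T + B) / T * F2[t] ** 2
        total -= 2 * (1 + B) / T * F2[t]
        total += 2 * F2[t] * sum(f[a] * (M[a] + B / T * Mt[t][a])
                                 for a in range(n))
    return total


samples = [[[1, 0], [1, 1]], [[0, 1], [0, 2]]]   # T = 2, C = 2, n = 2
F1, F2 = [[0.5, -0.25], [0.1, 0.3]], [0.2, -0.1]
print(cost_direct(F1, F2, samples, 2.0), cost_expanded(F1, F2, samples, 2.0))
```

Once the statistics are known, the cost no longer needs the individual samples. This is why the sums over $p$ and $c$ disappear from the equations that follow.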


2.7.3 Reduction to Linear Equations

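The first order conditions for the quadratic cost function to reach a minimum are

$$\frac{\partial\,\text{COST}}{\partial F1_{rs}(x)} = 0 \quad \text{for } r=1,\dots,N;\ s=1,\dots,N;\ x=1,\dots,T$$

and

$$\frac{\partial\,\text{COST}}{\partial F2(z)} = 0 \quad \text{for } z=1,\dots,T \tag{44}$$

The accompanying second order conditions are

$$\frac{\partial^2 \text{COST}}{\partial F1_{rs}(x)^2} > 0 \ \text{ for all } r,s,x, \qquad \frac{\partial^2 \text{COST}}{\partial F2(z)^2} > 0 \ \text{ for all } z \tag{45}$$

The derivatives can be easily evaluated:

$$\begin{aligned}
\frac{\partial\,\text{COST}}{\partial F1_{rs}(x)} &= 2\sum_{i,j} F1_{ij}(x)\Big[J_{rsij} + \frac{B}{T}J_{rsij}(x)\Big] - \frac{2(1+B)}{T} M_{rs}(x) + 2F2(x)\Big[M_{rs} + \frac{B}{T}M_{rs}(x)\Big] \\
\frac{\partial\,\text{COST}}{\partial F2(z)} &= \frac{2(T+B)}{T} F2(z) - \frac{2(1+B)}{T} + 2\sum_{k,l} F1_{kl}(z)\Big[M_{kl} + \frac{B}{T}M_{kl}(z)\Big]
\end{aligned} \tag{46}$$

and the second derivatives are

$$\frac{\partial^2 \text{COST}}{\partial F1_{rs}(x)^2} = 2J_{rsrs} + \frac{2B}{T}J_{rsrs}(x), \qquad \frac{\partial^2 \text{COST}}{\partial F2(z)^2} = \frac{2(T+B)}{T} \tag{47}$$

$J_{rsrs}$ and $J_{rsrs}(x)$ are means of squared intensities, so they are always greater than zero unless a pixel is zero in every sample. More generally, the Hessian of (43) is a weighted sum of outer products of the sample vectors augmented with a constant 1. It is therefore positive semidefinite, and any solution of the first order conditions is a minimum. We need only solve the linear equations obtained by setting the derivatives (46) to zero, as (44) requires.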
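Setting the derivatives (46) to zero gives, separately for each class $x$, $N^2+1$ linear equations in $F1_{rs}(x)$ and $F2(x)$. The minimal sketch below builds and solves these systems with ordinary Gaussian elimination with partial pivoting. That solver is a standard choice made here; the thesis does not specify one. The sketch assumes flattened images and nonsingular systems. The systems are singular, for example, when some pixel is zero in every sample.

```python
# Sketch: solve the first order conditions (44), using the derivatives (46)
# divided by 2. Assumptions: flattened length-n images, samples[t] lists the
# C images of class t, classes numbered from 0. The conditions decouple by
# class x into an (n+1) x (n+1) system in F1(x) and F2(x).


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting (A square, nonsingular)."""
    m = len(A)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular system, e.g. a pixel that is always 0")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def optimal_filter(samples, B):
    T, n = len(samples), len(samples[0][0])
    Jt = [[[mean(x[a] * x[b] for x in samples[t]) for b in range(n)]
           for a in range(n)] for t in range(T)]
    Mt = [[mean(x[a] for x in samples[t]) for a in range(n)]
          for t in range(T)]
    J = [[mean(Jt[t][a][b] for t in range(T)) for b in range(n)]
         for a in range(n)]
    M = [mean(Mt[t][a] for t in range(T)) for a in range(n)]
    F1, F2 = [], []
    for x in range(T):
        # Rows r = 0..n-1: dCOST/dF1_r(x) = 0.
        A = [[J[r][i] + B / T * Jt[x][r][i] for i in range(n)]
             + [M[r] + B / T * Mt[x][r]] for r in range(n)]
        rhs = [(1 + B) / T * Mt[x][r] for r in range(n)]
        # Last row: dCOST/dF2(x) = 0.
        A.append([M[k] + B / T * Mt[x][k] for k in range(n)] + [(T + B) / T])
        rhs.append((1 + B) / T)
        solution = solve_linear(A, rhs)
        F1.append(solution[:n])
        F2.append(solution[n])
    return F1, F2


# Toy data where a zero-cost linear map exists: y0 = x[0], y1 = 1 - x[0].
F1, F2 = optimal_filter([[[1, 0], [1, 1]], [[0, 1], [0, 2]]], B=1.0)
print(F1, F2)  # approximately [[1, 0], [-1, 0]] and [0, 1]
```

Because the classes decouple, $T$ systems of size $N^2+1$ are solved rather than one system of size $T(N^2+1)$.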

The second order mapping will also reduce to a set of linear equations, but it will be much more tedious because of the large dimension. One other comment should be made about second order (or higher) mappings. The equation discussed earlier (equation (24)) is not really the one we want, since

$$I_{kl}(p,c)\, I_{ij}(p,c) = I_{ij}(p,c)\, I_{kl}(p,c) \tag{48}$$

so each product of two different pixels appears twice. We really want the summations on the quadratic terms to include each pair of pixels only once. For example, let $k=1,\dots,N$ and $l=1,\dots,N$, and keep only the pixels $(i,j)$ with $(i-1)N+j \le (k-1)N+l$. That summation will exclude all duplicate terms.
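To make the size of the second order problem concrete, the sketch below enumerates each unordered pixel pair once using the row-major ordering just described. It also gives the closed-form count $N^2(N^2+1)/2$. The function names are illustrative.

```python
# Sketch: list the distinct quadratic terms I_kl * I_ij of a second order
# mapping, each unordered pixel pair once. Pixels are 1-based (k, l) as in
# the text; a pair is kept only when (i, j) does not come after (k, l) in
# row-major order, i.e. (i-1)N + j <= (k-1)N + l.


def quadratic_terms(N):
    pixels = [(k, l) for k in range(1, N + 1) for l in range(1, N + 1)]
    return [((i, j), (k, l))
            for (k, l) in pixels
            for (i, j) in pixels
            if (i - 1) * N + j <= (k - 1) * N + l]


def count_quadratic_terms(N):
    """Closed form N^2 (N^2 + 1) / 2, including the squares I_kl * I_kl."""
    n = N * N
    return n * (n + 1) // 2


print(len(quadratic_terms(2)), count_quadratic_terms(2))  # 10 10
for N in (8, 32):
    print(N, count_quadratic_terms(N))  # 8 2080, then 32 524800
```

Each class of a second order mapping on a 32 by 32 grid therefore has over half a million quadratic coefficients. This is the "large dimension" referred to above.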


2.8 Recommendations

The  recommendations  made  here  are  the  first  in  a 
continuing  study.  The  first  phase  is  directed  at  letters. 
The  intention  is  that  the  studies  recommended  here  will  lay 
the  groundwork  for  the  more  complex  scene  analysis  task.  The 
recommendations  are: 

2.8.1 Optimize Isolated Letter Recognition

This problem reduces to the simplest set of equations and is the easiest to solve numerically. Valuable insights can be gained from this problem. In particular, the relative performance of first, second and maybe even third order filters can be investigated. There is also the possibility of studying the effect of preprocessing.

2.8.2 Optimize the Location and Identification of Text Letters

The programming to solve this problem should be available from the solution above. A slight modification might be necessary to accommodate the change in dimension of the destination space for the location problem. This problem is different because the transformation also has to filter out the useless information contained in the adjacent letters. It is intended that the optimum algorithm itself will do that.


The problem assumes that we have some idea of the location we are interested in identifying. For example, one might locate the lines of print using some conventional means of scene analysis and then assume the first letter is at the left end of the line. After the first letter is identified, the location of the second letter might be estimated as a predetermined distance to the right of the last center. This would proceed until the end of the line was reached, and then the next line would be read.
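The reading procedure just described can be sketched as a simple loop. Every name here is hypothetical. `identify` stands in for an isolated-letter classifier, the letter pitch and window size are assumed constants, and the lines of print are assumed to have been located already.

```python
# Sketch of the line-reading procedure above. Every name is hypothetical:
# identify(window) stands for an isolated-letter classifier, pitch is the
# predetermined letter spacing in pixels, size is the square window width,
# and each line (row of its centre, left end, right end exclusive) is
# assumed to have been found already by conventional scene analysis.


def read_line(image, row, left, right, pitch, size, identify):
    half = size // 2
    letters = []
    centre = left + half                    # first letter at the left end
    while centre - half + size <= right:    # window must fit on the line
        top, col = row - half, centre - half
        window = [r[col:col + size] for r in image[top:top + size]]
        letters.append(identify(window))
        centre += pitch                     # next letter: fixed step right
    return "".join(letters)


def read_page(image, lines, pitch, size, identify):
    """Read the located lines in the order they are listed."""
    return [read_line(image, row, left, right, pitch, size, identify)
            for row, left, right in lines]


# Toy page: one line of three 3 x 3 cells; a lit centre pixel means "X".
page = [[0] * 9, [1, 1, 1, 0, 0, 0, 1, 1, 1], [0] * 9]
print(read_line(page, 1, 0, 9, 3, 3, lambda w: "X" if w[1][1] else "."))  # X.X
```

A fixed pitch is the simplest reading of "a predetermined distance". A practical reader would presumably refine each centre estimate, but that is beyond what the text describes.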


III. 


3.1  Norm  Correlation 

The  norm  correlation  (or  other  differencing 
correlations)  may  use  much  of  the  available  cost  attempting 
to  normalize  vectors  of  different  lengths  but  pointed  in  the 
same  direction.  They  may  also  spend  additional  cost  trying 
to  make  two  different  letters  from  the  same  class  alike 
rather  than  just  making  use  of  the  characteristics  that  make 
each  of  them  identifiable. 


3.2 Dot Product Correlation

The  dot  product  correlation  does  not  have  the 
differencing  problem  that  the  norm  correlation  has,  but  it  is 
severely  limited  by  the  fact  that  it  is  only  a  first  order 
mapping.  The  first  order  mapping  is  severely  limiting 
because  it  cannot  require  the  intensity  at  two  pixels  to  be 
simultaneously  large. 
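A toy comparison with hypothetical four-pixel templates illustrates the first point of 3.1. A faint copy of letter A points in A's direction but is shorter than A's template. The norm (differencing) correlation is misled by the length alone, while the dot product is not. The templates and the 0.4 scale factor are made up for the illustration.

```python
# Toy comparison with made-up 4-pixel templates: dim_a points in the same
# direction as template "A" but is shorter (a faint copy of the letter).
import math

TEMPLATES = {"A": [1.0, 1.0, 0.0, 0.0], "B": [1.0, 0.0, 0.0, 0.0]}


def norm_distance(x, t):
    """Differencing (norm) correlation: smaller means a better match."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, t)))


def dot(x, t):
    """Dot product correlation: larger means a better match."""
    return sum(a * b for a, b in zip(x, t))


def best_by_norm(x):
    return min(TEMPLATES, key=lambda name: norm_distance(x, TEMPLATES[name]))


def best_by_dot(x):
    return max(TEMPLATES, key=lambda name: dot(x, TEMPLATES[name]))


dim_a = [0.4 * v for v in TEMPLATES["A"]]
print(best_by_norm(dim_a), best_by_dot(dim_a))  # B A
```

The dot product handles this case, but it is still a first order mapping and inherits the limitation described in 3.2.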

3.3 Discrete Fourier Transform

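The DFT buys nothing in this problem formulation. The DFT, the frequency filter and the dot product with a template are all linear in the image, and the norm correlation is at most quadratic in it. Their combination is therefore itself a mapping of the kind optimized in the spatial domain. In other words, the optimum filter, template and transform are included in the range of the spatial mappings proposed.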


IV. 


The quadratic cost function is a compromise. We should investigate the non-quadratic cost function and attempt to find a method of solution for those very large systems.

The existence and uniqueness of solutions for both the quadratic and non-quadratic problem formulations should be investigated.

The two recommendations above are needed to make the theory complete enough for application. Another part of the study must also be undertaken. Now that we have a method of attack, the engineering must be done so that it can be applied to practical problems. The theoretician's job is to propose an idealized formulation and solution for a problem. The engineer's job is to take that ideal solution and apply it to a practical problem. The recommendation is to do a little more theoretical development before turning the problem into an engineering application. Note that it is still useful to carry out the task recommended in section 2.8 to gain preliminary performance insights.
Appendix


The two-dimensional DFT pair (Ref 6) is

$$FI(m'+1,n'+1) = \frac{1}{N}\sum_{k'=0}^{N-1}\sum_{l'=0}^{N-1} \exp\Big[-i\frac{2\pi}{N}(m'k' + l'n')\Big]\, I(k'+1,l'+1) \tag{49}$$

$$I(k'+1,l'+1) = \frac{1}{N}\sum_{m'=0}^{N-1}\sum_{n'=0}^{N-1} \exp\Big[i\frac{2\pi}{N}(k'm' + l'n')\Big]\, FI(m'+1,n'+1) \tag{50}$$

The coefficients in the transformation can be represented by array elements. That is

$$FC(m,n,k,l) = \exp\Big\{-i\frac{2\pi}{N}\big[(m-1)(k-1) + (l-1)(n-1)\big]\Big\} \tag{51}$$

$$FCI(k,l,m,n) = \exp\Big\{i\frac{2\pi}{N}\big[(k-1)(m-1) + (l-1)(n-1)\big]\Big\} \tag{52}$$

Then the DFT is

$$FI(m,n) = \frac{1}{N}\sum_{k=1}^{N}\sum_{l=1}^{N} FC(m,n,k,l)\, I(k,l) \tag{53}$$

and its inverse is

$$I(k,l) = \frac{1}{N}\sum_{m=1}^{N}\sum_{n=1}^{N} FCI(k,l,m,n)\, FI(m,n)$$
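As a hedged check of (51) to (53), the sketch below transcribes the coefficient arrays with 0-based indices and confirms that the forward and inverse transforms undo each other. Both directions carry the $1/N$ factor of (49) and (50). The function names are illustrative.

```python
# Sketch of equations (51)-(53) and the inverse, using 0-based indices
# (m-1, n-1, k-1, l-1 in the text become m, n, k, l here). Both directions
# carry the 1/N factor of (49)-(50), i.e. the unitary normalization of an
# N x N two-dimensional DFT.
import cmath


def fc(m, n, k, l, N):
    """Forward coefficient FC(m, n, k, l) of equation (51)."""
    return cmath.exp(-2j * cmath.pi * (m * k + l * n) / N)


def fci(k, l, m, n, N):
    """Inverse coefficient FCI(k, l, m, n) of equation (52)."""
    return cmath.exp(2j * cmath.pi * (k * m + l * n) / N)


def dft2(image):
    N = len(image)
    return [[sum(fc(m, n, k, l, N) * image[k][l]
                 for k in range(N) for l in range(N)) / N
             for n in range(N)] for m in range(N)]


def idft2(spectrum):
    N = len(spectrum)
    return [[sum(fci(k, l, m, n, N) * spectrum[m][n]
                 for m in range(N) for n in range(N)) / N
             for l in range(N)] for k in range(N)]


image = [[1, 2], [3, 4]]
spectrum = dft2(image)
print(spectrum[0][0])   # DC term: (1 + 2 + 3 + 4) / 2
print(idft2(spectrum))  # recovers the image up to rounding error
```

The direct double sum costs $O(N^4)$ operations. It mirrors (53) term by term, whereas a fast Fourier transform would not.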


In the text we refer to the DFT as a pure rotation. A pure rotation (Ref 5) has the following properties: the norm (length) of a vector is the same in the new coordinate space, and the dot product of two vectors remains the same in the new coordinate system. Because the DFT is complex, the dot product here is the Hermitian inner product, and the $1/N$ normalization of (49) and (50) makes the transformation unitary. Those are just the properties we need so that the dot product and norm correlation can be done in either space once the filter has been applied.
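A numerical check of the two rotation properties follows, using the normalization of (49). The Hermitian inner product of two transformed images equals the ordinary dot product of the original real images, and the norm is likewise preserved. The test images are arbitrary made-up values.

```python
# Check that the 1/N-normalized 2-D DFT of (49) preserves dot products and
# norms (Parseval). Images are arbitrary made-up 3 x 3 arrays.
import cmath


def dft2(image):
    """Unitary N x N DFT, as in equation (49) (0-based indices)."""
    N = len(image)
    return [[sum(cmath.exp(-2j * cmath.pi * (m * k + n * l) / N) * image[k][l]
                 for k in range(N) for l in range(N)) / N
             for n in range(N)] for m in range(N)]


def inner(a, b):
    """Hermitian inner product: sum of a * conj(b) over all pixels."""
    return sum(x * y.conjugate() for ra, rb in zip(a, b) for x, y in zip(ra, rb))


I1 = [[1, 0, 2], [0, 3, 1], [2, 1, 0]]
I2 = [[0, 1, 1], [1, 0, 2], [3, 0, 1]]
print(inner(I1, I2), inner(dft2(I1), dft2(I2)))  # dot products agree
print(inner(I1, I1), inner(dft2(I1), dft2(I1)))  # squared norms agree
```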